Statistical learning: basics, R, and linear regression
MACS 30100
University of Chicago
February 8, 2017
What is statistical learning?

Why estimate \(f\)?
- Prediction
- Inference
How do we estimate \(f\)?
- Parametric methods
- Non-parametric methods
Parametric methods
- First make an assumption about the functional form of \(f\)
- Once a model has been selected, fit (train) it on the observed data
OLS
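To make the fitting step concrete, here is one worked form of it, assuming \(f\) is linear in \(p\) predictors and we observe \(n\) cases. The symbols \(p\), \(n\), \(x_{ij}\), and \(y_i\) are introduced here for illustration and do not appear on the original slide. Ordinary least squares (OLS) picks the coefficients that minimize the residual sum of squares:

\[\hat{\beta} = \arg\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2\]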

Parametric methods
\[Y = \beta_0 + \beta_{1}X_1\]
- \(Y =\) sales
- \(X_{1} =\) advertising spending in a given medium
- \(\beta_0 =\) intercept
- \(\beta_1 =\) slope
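A minimal sketch of fitting this one-predictor model by ordinary least squares, assuming the standard closed-form estimates (the slope is the sample covariance of \(X_1\) and \(Y\) divided by the sample variance of \(X_1\), and the intercept follows from the means). The function name `fit_simple_ols` and the numbers below are hypothetical, made up for illustration rather than taken from the lecture's data.

```python
def fit_simple_ols(x, y):
    """Return (beta_0, beta_1) for the model Y = beta_0 + beta_1 * X_1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sum of squared deviations of x, and cross-deviations of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")

    beta_1 = s_xy / s_xx              # slope
    beta_0 = y_bar - beta_1 * x_bar   # intercept
    return beta_0, beta_1


# Made-up illustrative data (NOT the lecture's advertising data).
advertising = [10.0, 20.0, 30.0, 40.0, 50.0]   # X_1: advertising spending
sales = [7.1, 9.3, 10.8, 13.2, 14.6]           # Y: sales

beta_0, beta_1 = fit_simple_ols(advertising, sales)
print(f"intercept = {beta_0:.3f}, slope = {beta_1:.3f}")

# Prediction for a new, hypothetical spending level.
new_spending = 35.0
print(f"predicted sales = {beta_0 + beta_1 * new_spending:.3f}")
```

Under this model, `beta_1` is read as the expected change in sales per additional unit of advertising spending, and `beta_0` as the expected sales when spending is zero.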
Non-parametric methods
- No assumptions about functional form
- Use data to estimate \(f\) directly
    - Get as close to the data points as possible
    - Avoid excess complexity
- Requires a large number of observations
LOESS
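A rough sketch of the idea behind local regression, under simplifying assumptions: for each target point, take the nearest fraction `span` of the observations, weight them with a tricube kernel, and fit a weighted straight line. The function name `loess_point`, the default `span`, and the data are hypothetical choices for illustration. This is not the implementation any particular library uses, and production versions typically add refinements omitted here.

```python
import math


def loess_point(x, y, x0, span=0.75):
    """Estimate f(x0) with a locally weighted linear fit (simplified LOESS)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Keep the k observations closest to x0.
    k = min(n, max(2, math.ceil(span * n)))
    nearest = sorted((abs(xi - x0), xi, yi) for xi, yi in zip(x, y))[:k]
    d_max = nearest[-1][0]

    # Tricube weights: near points count most, the farthest point gets 0.
    if d_max == 0:
        weights = [1.0] * k
    else:
        weights = [(1 - (d / d_max) ** 3) ** 3 for d, _, _ in nearest]

    s_w = sum(weights)
    if s_w == 0:
        # All neighbours are equally far away: fall back to a plain mean.
        return sum(yi for _, _, yi in nearest) / k

    # Weighted least squares for a local straight line.
    x_w = sum(w * xi for w, (_, xi, _) in zip(weights, nearest)) / s_w
    y_w = sum(w * yi for w, (_, _, yi) in zip(weights, nearest)) / s_w
    s_xx = sum(w * (xi - x_w) ** 2 for w, (_, xi, _) in zip(weights, nearest))
    if s_xx == 0:
        return y_w  # no local variation in x: use the weighted mean
    s_xy = sum(
        w * (xi - x_w) * (yi - y_w) for w, (_, xi, yi) in zip(weights, nearest)
    )
    slope = s_xy / s_xx
    return y_w + slope * (x0 - x_w)


# Made-up illustrative data following a curved pattern.
xs = [float(i) for i in range(20)]
ys = [math.sin(xi / 3.0) for xi in xs]

# No functional form is assumed; the curve is traced point by point.
smoothed = [loess_point(xs, ys, x0, span=0.3) for x0 in xs]
print([round(v, 2) for v in smoothed[:5]])
```

A smaller `span` follows the data more closely, while a larger one gives a smoother, less complex curve. Because each estimate relies only on nearby points, the method needs many observations to work well.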

Statistical learning vs. machine learning
- Statistical learning
    - Subfield of statistics
    - Focused predominantly on inference
- Machine learning
    - Subfield of computer science
    - Focused predominantly on prediction
Why R?

Things R does well
- Statistical analysis
- Data visualization
Things R does not do as well
Why are we not using Python?

But I don’t wanna!
Caveat emptor